A Morphological Processor Based on Foma for Biscayan (a Basque dialect)

Authors

  • Iñaki Alegria
  • Garbiñe Aranbarri
  • Klara Ceberio
  • Gorka Labaka
  • Bittor Laskurain
  • Ruben Urizar
Abstract

We present a new morphological processor for Biscayan, a dialect of Basque, built on the existing description of standard Basque morphology. The database for the standard morphology has been extended to cover dialects, and foma, an open-source tool for morphological description, is used to build the processor. XuxenB, a spelling checker/corrector for this dialect, is the first application of this work.

1. Basque and Biscayan morphology

Basque is an agglutinative language with rich morphology. Standard Basque morphology was described by Alegria et al. (1995; 2002) using finite-state morphology. The Biscayan dialect of Basque (Arejita et al., 2002, 2005), also called Western Basque (Zuazo, 2008), is spoken in the western part of the Basque-speaking area, mainly in the province of Biscay, but also in southwest Guipuzcoa and the Basque-speaking areas of Álava. Although it is the most widespread of the Basque dialects, it differs considerably from standard Basque, which is heavily based on the Guipuzcoan dialect. While standard written Basque is used at all levels of education and in media intended for the whole Basque-speaking population, there is increasing interest in using Biscayan in local media and informal communication channels (blogs, chats, phones).

It is difficult to calculate the number of Biscayan speakers, as most surveys on the use of Basque focus on Basque speakers as a whole. Nevertheless, according to an estimate by the Labayru institute (www.labayru.org), a cultural institution working for the promotion of Biscayan, the dialect has about 250,000 speakers.


Similar resources

Porting Basque Morphological Grammars to foma, an Open-Source Tool

Basque is a morphologically rich language, of which several finite-state morphological descriptions have been constructed, primarily using the Xerox/PARC finite-state tools. In this paper we describe the process of porting a previous description of Basque morphology to foma, an open-source finite-state toolkit compatible with Xerox tools, provide a comparison of the two tools, and contrast the ...


Using foma for language-based games

This paper describes two examples of how finite-state technology (FST), commonly used in computational morphology, can help implement language-based games. The tool we have used is foma, an open-source toolkit similar to the earlier Xerox/PARC finite-state tools. FST tools have been widely used to describe the morphology of languages and to implement spelling checkers and correctors, especially for ...


Developing an Open-Source FST Grammar for Verb Chain Transfer in a Spanish-Basque MT System

This paper presents the current status of development of a finite-state transducer grammar for the verbal-chain transfer module in Matxin, a rule-based machine translation system between Spanish and Basque. Due to the distance between Spanish and Basque, verbal-chain transfer is a very complex module in the overall system. The grammar is compiled with foma, an open-source finite-state toolki...


Morfología de estados finitos en software libre: aplicación al euskera

In this paper we describe the process of conversion and testing of the description for the Basque morphology from the Xerox toolkit to foma, a new open-source tool.


Globalization, Standardization, and Dialect Leveling in Iran

This paper attempts to shed light on the effects of modernization, urbanization, the monolingual educational system, and mass media, as well as the process of globalization, on dialect leveling among Persian dialects. To that end, the first part of the paper elaborates on the relationship between globalization and sociolinguistics, and on the concept of standardization. Also, it discusses some ...



Publication date: 2010